Sequence Comparisons via Algorithmic Mutual Information

Author

  • Aleksandar Milosavljevic
Abstract

One of the main problems in DNA and protein sequence comparisons is to decide whether the observed similarity of two sequences should be explained by their relatedness or by the mere presence of some shared internal structure, e.g., shared internal tandem repeats. The standard methods that are based on statistics or classical information theory can be used to discover either internal structure or mutual sequence similarity, but cannot take both into account. Consequently, currently used methods for sequence comparison employ "masking" techniques that simply eliminate sequences that exhibit internal repetitive structure prior to sequence comparisons. The "masking" approach precludes discovery of homologous sequences of moderate or low complexity, which abound at both the DNA and protein levels. As a solution to this problem, we propose a general method that is based on algorithmic information theory and minimal length encoding. We show that algorithmic mutual information factors out the sequence similarity that is due to shared internal structure and thus enables discovery of truly related sequences. We extend the recently developed algorithmic significance method (Milosavljević & Jurka 1993) to show that significance depends exponentially on algorithmic mutual information.
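The idea in the abstract can be illustrated with a rough sketch. It is not the paper's encoding scheme: it assumes that a general-purpose compressor (`zlib`) can stand in for minimal length encoding, approximates the conditional description length of `t` given `s` as `C(s + t) - C(s)`, and reads "significance depends exponentially" as a bound of the form 2^(-d) for a saving of d bits. All function names are hypothetical, and the numbers include compressor overhead, so only large differences are meaningful.

```python
import math
import random
import zlib

ALPHABET = b"ACGT"


def compressed_bits(data: bytes) -> int:
    """Bits in the zlib encoding of data: a crude upper-bound proxy for description length."""
    return 8 * len(zlib.compress(data, 9))


def conditional_bits(target: bytes, given: bytes) -> int:
    """Approximate the cost of encoding target once given is known (assumption: C(given+target) - C(given))."""
    return compressed_bits(given + target) - compressed_bits(given)


def mutual_information_bits(s: bytes, t: bytes) -> int:
    """Bits saved on t by knowing s: cost of t alone minus cost of t given s."""
    return compressed_bits(t) - conditional_bits(t, s)


def log10_significance_bound(d_bits: float) -> float:
    """log10 of an assumed bound 2^(-d); returned as a logarithm to avoid float underflow."""
    return -max(d_bits, 0.0) * math.log10(2.0)


def random_dna(n: int, rng: random.Random) -> bytes:
    """Uniform random sequence over A, C, G, T (illustrative data only)."""
    return bytes(rng.choice(ALPHABET) for _ in range(n))


def point_mutate(seq: bytes, n_sites: int, rng: random.Random) -> bytes:
    """Copy of seq with n_sites distinct positions changed to a different base."""
    out = bytearray(seq)
    for i in rng.sample(range(len(out)), n_sites):
        out[i] = rng.choice([b for b in ALPHABET if b != out[i]])
    return bytes(out)


if __name__ == "__main__":
    rng = random.Random(0)
    ancestor = random_dna(3000, rng)
    cases = {
        "related, complex": (ancestor, point_mutate(ancestor, 10, rng)),
        "unrelated, complex": (ancestor, random_dna(3000, rng)),
        "identical tandem repeats": (b"ACGT" * 750, b"ACGT" * 750),
    }
    for name, (s, t) in cases.items():
        d = mutual_information_bits(s, t)
        print(f"{name}: {d} bits saved, log10 bound {log10_significance_bound(d):.1f}")
```

Under these assumptions, the pair of identical tandem repeats saves few bits even though the two sequences align perfectly, because each repeat is already cheap to encode on its own. The mutated copy of a complex sequence saves many bits. This is the sense in which mutual information factors out shared internal structure.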

Similar resources

Probabilistic Sufficiency and Algorithmic Sufficiency from the point of view of Information Theory

Given the importance of Markov chains in information theory, the definition of conditional probability for these random processes can also be defined in terms of mutual information. In this paper, the relationship between the concept of sufficiency and Markov chains from the perspective of information theory and the relationship between probabilistic sufficiency and algorithmic sufficien...

Mutual Dimension and Random Sequences

If S and T are infinite sequences over a finite alphabet, then the lower and upper mutual dimensions mdim(S : T) and Mdim(S : T) are the upper and lower densities of the algorithmic information that is shared by S and T. In this paper we investigate the relationships between mutual dimension and coupled randomness, which is the algorithmic randomness of two sequences R1 and R2 with respect t...

Standardized Mutual Information for Clustering Comparisons: One Step Further in Adjustment for Chance

Mutual information is a very popular measure for comparing clusterings. Previous work has shown that it is beneficial to make an adjustment for chance to this measure, by subtracting an expected value and normalizing via an upper bound. This yields the constant baseline property that enhances intuitiveness. In this paper, we argue that a further type of statistical adjustment for the mutual inf...

A New Entropy Based Model for the Detection of Correlated Mutations in Multiple Sequence Alignments

The recent advent of complete genome sequencing provides a tremendous amount of data for research into the structural basis of the function of proteins. However, the sheer amount of data is both a blessing and a curse. In order to facilitate the utilization of this information, numerous algorithmic analysis procedures have been developed to identify functionally important residues. In this p...

An operational characterization of mutual information in algorithmic information theory

We show that the mutual information, in the sense of Kolmogorov complexity, of any pair of strings x and y is equal, up to logarithmic precision, to the length of the longest shared secret key that two parties, one having x and the complexity profile of the pair and the other one having y and the complexity profile of the pair, can establish via a probabilistic protocol with interaction on a pu...

Journal title:
  • Proceedings. International Conference on Intelligent Systems for Molecular Biology

Volume 2

Publication date: 1994